Abstract

We investigate geometrical properties of the random K-satisfiability problem
using the notion of x-satisfiability: a formula is x-satisfiable if there exist
two SAT assignments differing in Nx variables. We show the existence of a sharp
threshold for this property as a function of the clause density. For large
enough K, we prove that there exists a region of clause density, below the
satisfiability threshold, where the landscape of Hamming distances between SAT
assignments experiences a gap: pairs of SAT assignments exist at small x, and
around x = 1/2, but they do not exist at intermediate values of x. This result
is consistent with the clustering scenario which is at the heart of the recent
heuristic analysis of satisfiability using statistical physics (the cavity
method), and its algorithmic counterpart (the survey propagation algorithm).
Our method uses elementary probabilistic arguments (first and second moment
methods), and might be useful in other problems of computational and physical
interest where similar phenomena appear.
1 Introduction and outline

Consider a string of Boolean variables, or equivalently a string of spins, of
size N: $\vec\sigma = \{\sigma_i\} \in \{-1, 1\}^N$. Call a K-clause a
disjunction binding K of these Boolean variables in such a way that one of
their $2^K$ joint assignments is set to FALSE, and all the others to TRUE. A
formula in conjunctive normal form (CNF) is a conjunction of such clauses. The
satisfiability problem is stated as: does there exist a truth assignment
$\vec\sigma$ that satisfies this formula? A CNF formula is said to be
satisfiable (SAT) if this is the case, and unsatisfiable (UNSAT) otherwise.

The satisfiability problem is often viewed as the canonical constraint satisfac- 
tion problem (CSP). It is the first problem to have been shown NP-complete 
[sl, i.e. at least as hard as any problem for which a solution can be checked in 
polynomial time. 

The P $\neq$ NP conjecture states that no general polynomial-time algorithm
exists that can decide whether a formula is SAT or UNSAT. However, formulas
which are encountered in practice can often be solved easily. In order to
understand properties of some typical families of formulas, one introduces a
probability measure on the set of instances. In the random K-SAT problem, one
generates a random K-CNF formula $F_K(N, M)$ as a conjunction of $M = N\alpha$
K-clauses, each of them being uniformly drawn from the $2^K \binom{N}{K}$
possibilities. In recent years the random K-satisfiability problem has
attracted much interest in computer science and in statistical physics. Its
most striking feature is certainly its sharp threshold.

Throughout this paper, 'with high probability' (w.h.p.) means with a
probability which goes to one as $N \to \infty$.

Conjecture 1.1 (Satisfiability Threshold Conjecture) For all $K \ge 2$, there
exists $\alpha_c(K)$ such that:

• if $\alpha < \alpha_c(K)$, $F_K(N, N\alpha)$ is satisfiable w.h.p.

• if $\alpha > \alpha_c(K)$, $F_K(N, N\alpha)$ is unsatisfiable w.h.p.

The random K-SAT problem, for large N and $\alpha$ close to $\alpha_c(K)$,
provides instances of very hard CNF formulas that can be used as benchmarks for
algorithms. For such hard ensembles, the study of the typical complexity could
be crucial for the understanding of the usual 'worst-case' complexity.

Although Conjecture 1.1 remains unproved, Friedgut established the existence
of a non-uniform sharp threshold [12].

Theorem 1.2 (Friedgut) For each $K \ge 2$, there exists a sequence
$\alpha_N(K)$ such that for all $\epsilon > 0$:

$$\lim_{N\to\infty} P(F_K(N, N\alpha) \text{ is satisfiable}) =
\begin{cases} 1 & \text{if } \alpha = (1-\epsilon)\alpha_N(K), \\
0 & \text{if } \alpha = (1+\epsilon)\alpha_N(K). \end{cases} \qquad (1)$$



A lot of effort has been devoted to finding tight bounds for the threshold.
The best upper bounds so far were derived using first moment methods [13], [14],
and the best lower bounds were obtained by second moment methods [17], [18].
Using these bounds, it was shown that $\alpha_c(K) = 2^K \ln 2 - O(K)$ as
$K \to \infty$.

On the other hand, powerful, self-consistent, but non-rigorous tools from
statistical physics were used to predict specific values of $\alpha_c(K)$, as
well as heuristic asymptotic expansions for large K [20], [21], [22]. The
cavity method [19], which provides these results, relies on several unproven
assumptions motivated by spin-glass theory, the most important of which is the
partition of the space of SAT-assignments into many states or clusters far away
from each other (with Hamming distance greater than $cN$ as $N \to \infty$), in
the so-called hard-SAT phase.



So far, the existence of such a clustering phase has been shown rigorously in
the simpler case of the random XORSAT problem [33], [32], [34], in compliance
with the prediction of the cavity method, but its existence is predicted in
many other problems, such as q-colorability [27], [28] or the Multi-Index
Matching Problem [29]. At the heuristic level, clustering is an important
phenomenon, often held responsible for entrapping local search algorithms into
non-optimal metastable states [26]. It is also a limiting feature for the
belief propagation iterative decoding algorithms in Low Density Parity Check
Codes [30], [31].



In this paper we provide a rigorous analysis of some geometrical properties of
the space of SAT-assignments in the random K-SAT problem. This study
complements the results of [35], and its results are consistent with the
clustering scenario. A new characterizing feature of CNF formulas, the
'x-satisfiability', is proposed, which carries information about the spectrum
of distances between SAT-assignments. The x-satisfiability property is studied
thoroughly using first and second moment methods previously developed for the
satisfiability threshold.



The Hamming distance between two assignments $(\vec\sigma, \vec\tau)$ is
defined by

$$d_{\vec\sigma\vec\tau} = \frac{N}{2} - \frac{1}{2}\sum_{i=1}^{N} \sigma_i \tau_i. \qquad (2)$$

(Throughout the paper the term 'distance' will always refer to the Hamming
distance.) Given a random formula $F_K(N, N\alpha)$, we define a 'SAT-x-pair'
as a pair of assignments $(\vec\sigma, \vec\tau) \in \{-1, 1\}^{2N}$, which
both satisfy F, and which are at a fixed distance specified by x as follows:

$$d_{\vec\sigma\vec\tau} \in [Nx - \epsilon(N), Nx + \epsilon(N)]. \qquad (3)$$

Here x is the proportion of distinct values between the two configurations,
which we keep fixed as N and d go to infinity. The resolution $\epsilon(N)$ has
to be $\ge 1$ and sub-extensive: $\lim_{N\to\infty} \epsilon(N)/N = 0$, but its
precise form is unimportant for our large N analysis. For example we can choose
$\epsilon(N) =$

Definition 1.3 A CNF formula is x-satisfiable if it possesses a SAT-x-pair. 

Note that for x = 0, x-satisfiability is equivalent to satisfiability, while for
x = 1, it is equivalent to Not-All-Equal satisfiability, where each clause must
contain at least one satisfied literal and at least one unsatisfied literal Idl.
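
To make Definition 1.3 concrete, the following is a minimal brute-force sketch for very small formulas. The clause encoding (signed 1-based integers) and the function names are assumptions made for illustration only; they are not part of the paper, and the enumeration is exponential in the number of variables.

```python
from itertools import product

def sat_assignments(clauses, n):
    """All assignments in {-1,+1}^n satisfying a CNF formula.

    Hypothetical encoding: a clause is a list of nonzero ints; literal +i
    is true when variable i (1-indexed) equals +1, literal -i when it is -1.
    """
    sols = []
    for sigma in product((-1, 1), repeat=n):
        if all(any(sigma[abs(l) - 1] == (1 if l > 0 else -1) for l in c)
               for c in clauses):
            sols.append(sigma)
    return sols

def pair_distances(clauses, n):
    """Sorted set of Hamming distances realised by pairs of SAT-assignments."""
    sols = sat_assignments(clauses, n)
    return sorted({sum(a != b for a, b in zip(s, t)) for s in sols for t in sols})

def is_x_satisfiable(clauses, n, x, eps=1):
    """True if some SAT pair lies at distance within eps of n*x (cf. Eq. (3))."""
    return any(abs(d - n * x) <= eps for d in pair_distances(clauses, n))
```

For the single clause (x1 or x2) on two variables, the three SAT-assignments realise the distances 0, 1 and 2, so the formula is x-satisfiable for every x at this resolution.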



The clustering property found heuristically in [21], [22] suggests the
following:

Conjecture 1.4 For all $K \ge K_0$, there exist $\alpha_1(K)$, $\alpha_2(K)$,
with $\alpha_1(K) < \alpha_2(K)$, such that: for all
$\alpha \in (\alpha_1(K), \alpha_2(K))$, there exist
$x_1(K,\alpha) < x_2(K,\alpha) < x_3(K,\alpha)$ such that:

• for all $x \in [0, x_1(K,\alpha)] \cup [x_2(K,\alpha), x_3(K,\alpha)]$, a
random formula $F_K(N, N\alpha)$ is x-satisfiable w.h.p.

• for all $x \in [x_1(K,\alpha), x_2(K,\alpha)] \cup [x_3(K,\alpha), 1]$, a
random formula $F_K(N, N\alpha)$ is x-unsatisfiable w.h.p.

Let us give a geometrical interpretation of this conjecture. The space of
SAT-assignments is partitioned into non-empty regions whose diameter is smaller
than $x_1$; the distance between any two of these regions is at least $x_2$,
while $x_3$ is the maximum distance between any pair of SAT-assignments. This
interpretation is compatible with the notion of clusters used in the
statistical physics approach. It should also be mentioned that in a
contribution posterior to this work [36], the number of regions was shown to be
exponential in the size of the problem, further supporting the statistical
mechanics picture.

Conjecture 1.4 can be rephrased in a slightly different way, which decomposes
it into two steps. The first step is to state the Satisfiability Threshold
Conjecture for pairs:

Conjecture 1.5 For all $K \ge 2$ and for all x, $0 < x < 1$, there exists an
$\alpha_c(K,x)$ such that:

• if $\alpha < \alpha_c(K,x)$, $F_K(N, N\alpha)$ is x-satisfiable w.h.p.

• if $\alpha > \alpha_c(K,x)$, $F_K(N, N\alpha)$ is x-unsatisfiable w.h.p.

The second step conjectures that for K large enough, as a function of x, the
function $\alpha_c(K, x)$ is non-monotonic and has two maxima: a local maximum
at a value $x_M(K) < 1$, and a global maximum at x = 0.



In this paper we prove the equivalent of Friedgut's theorem:

Theorem 1.6 For each $K \ge 3$ and x, $0 < x < 1$, there exists a sequence
$\alpha_N(K, x)$ such that for all $\epsilon > 0$:

$$\lim_{N\to\infty} P(F_K(N, N\alpha) \text{ is x-satisfiable}) =
\begin{cases} 1 & \text{if } \alpha = (1-\epsilon)\alpha_N(K,x), \\
0 & \text{if } \alpha = (1+\epsilon)\alpha_N(K,x), \end{cases} \qquad (4)$$

and we obtain two functions, $\alpha_{LB}(K,x)$ and $\alpha_{UB}(K,x)$, such
that:

• For $\alpha > \alpha_{UB}(K,x)$, a random K-CNF $F_K(N, N\alpha)$ is
x-unsatisfiable w.h.p.

• For $\alpha < \alpha_{LB}(K,x)$, a random K-CNF $F_K(N, N\alpha)$ is
x-satisfiable w.h.p.

The two functions $\alpha_{LB}(K,x)$ and $\alpha_{UB}(K,x)$ are lower and upper
bounds for $\alpha_N(K,x)$ as N tends to infinity. Numerical computations of
these bounds indicate that $\alpha_N(K,x)$ is non-monotonic as a function of x
for $K \ge 8$, as illustrated in Fig. 1. More precisely, we prove

Theorem 1.7 For all $\epsilon > 0$, there exists $K_0$ such that for all
$K \ge K_0$,

$$\min_{x \in (0,1)} \alpha_{UB}(K,x) \le (1+\epsilon)\,\frac{2^K \ln 2}{2}, \qquad (5)$$
$$\alpha_{LB}(K,0) \ge (1-\epsilon)\, 2^K \ln 2, \qquad (6)$$
$$\alpha_{LB}(K,1/2) \ge (1-\epsilon)\, 2^K \ln 2. \qquad (7)$$

This in turn shows that, for K large enough and in some well chosen interval
of $\alpha$ below the satisfiability threshold $\alpha_c \approx 2^K \ln 2$,
SAT-x-pairs exist for x close to zero and for x = 1/2, but they do not exist in
the intermediate x region.

Note that Eq. (6) was established in [18].

In Section 2 we establish rigorous and explicit upper bounds using the
first-moment method. The existence of a gap interval is proven in a certain
range of $\alpha$, and bounds on this interval are found, which imply Eq. (5)
in Theorem 1.7. Section 3 derives the lower bound, using a weighted
second-moment method, as developed recently in [17], [18], and presents
numerical results. In Section 4 we discuss the behavior of the lower bound for
large K. The case of x = 1/2 is treated rigorously, and Eq. (7) in Theorem 1.7
is proven. Other values of x are treated at the heuristic level. Section 5
presents a proof of Theorem 1.6. We discuss our results in Section 6.
Fig. 1. Lower and Upper Bounds for $\alpha_N(K = 8, x)$. The Upper Bound is
obtained by the first moment method. Above this curve there exists no
SAT-x-pair, w.h.p. The Lower Bound is obtained by the second moment method.
Below this curve there exists a SAT-x-pair w.h.p. For
$164.735 < \alpha < 170.657$, these curves confirm the existence of a
clustering phase, illustrated here for $\alpha = 166.1$: solid lines represent
x-sat regions, and wavy lines x-unsat regions. The x-sat zone near 0
corresponds to SAT-assignments belonging to the same region, whereas the x-sat
zone around 1/2 corresponds to SAT-assignments belonging to different regions.
The x-unsat region around .13 corresponds to the inter-cluster gap. We recall
that the best refined lower and upper bounds for the satisfiability threshold
$\alpha_c(K = 8)$ from [14] are respectively 173.253 and 176.596. The cavity
prediction is 176.543.

2 Upper bound: the first moment method 



The first moment method relies on Markov's inequality:

Lemma 2.1 Let X be a non-negative random variable. Then

$$P(X \ge 1) \le E(X). \qquad (8)$$

We take X to be the number of pairs of SAT-assignments at fixed distance:

$$Z(x, F) = \sum_{\vec\sigma,\vec\tau} \delta\left(d_{\vec\sigma\vec\tau} \in [Nx - \epsilon(N), Nx + \epsilon(N)]\right)\, \delta\left[\vec\sigma, \vec\tau \in S(F)\right], \qquad (9)$$

where $F = F_K(N, N\alpha)$ is a random K-CNF formula, and $S(F)$ is the set of
SAT-assignments to this formula. Throughout this paper $\delta(A)$ is an
indicator function, equal to 1 if the statement A is true, equal to 0
otherwise. The expectation E is over the set of random K-CNF formulas. Since
$Z(x, F) \ge 1$ is equivalent to 'F is x-satisfiable', (8) gives an upper bound
for the probability of x-satisfiability.



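The expected value of the double sum can be rewritten as:

$$E(Z) = 2^N \sum_{d \in [Nx - \epsilon(N), Nx + \epsilon(N)] \cap \mathbb{N}} \binom{N}{d}\, E\left[\delta(\vec\sigma, \vec\tau \in S(F))\right], \qquad (10)$$

where $\vec\sigma$ and $\vec\tau$ are any two assignments with Hamming distance
d. We have $\delta(\vec\sigma, \vec\tau \in S(F)) = \prod_c \delta(\vec\sigma, \vec\tau \in S(c))$,
where c denotes one of the M clauses. All clauses are drawn independently, so
that we have:

$$E(Z) \le (2\epsilon(N) + 1)\, 2^N \max_{d \in [Nx - \epsilon(N), Nx + \epsilon(N)] \cap \mathbb{N}} \left\{ \binom{N}{d} \left(E\left[\delta(\vec\sigma, \vec\tau \in S(c))\right]\right)^M \right\}, \qquad (11)$$

where we have bounded the sum by the maximal term times the number of terms.
$E[\delta(\vec\sigma, \vec\tau \in S(c))]$ can easily be calculated and its
value is: $1 - 2^{1-K} + 2^{-K}(1-x)^K + o(1)$. Indeed, among the $2^K$
realizations of the clause c on a given set of variables, exactly two are not
satisfied by both configurations (one violated by $\vec\sigma$, one by
$\vec\tau$), unless the two configurations coincide exactly on the domain of c,
in which case only one is.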
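
The counting argument above can be checked by direct enumeration. The sketch below fixes the K variables of a clause, enumerates the $2^K$ sign patterns, and returns the fraction for which both local configurations satisfy the clause. The function name and the tuple encoding are assumptions made for illustration; averaging over the choice of variables (which produces the factor $(1-x)^K$) is not performed here.

```python
from itertools import product

def prob_both_satisfy(sigma_loc, tau_loc):
    """Fraction of the 2^K sign patterns of a K-clause satisfied by both
    local configurations (tuples in {-1,+1}^K restricted to the clause)."""
    K = len(sigma_loc)
    good = 0
    for signs in product((-1, 1), repeat=K):
        # Literal i is true under s when signs[i] * s[i] == +1.
        sat_sigma = any(e * s == 1 for e, s in zip(signs, sigma_loc))
        sat_tau = any(e * t == 1 for e, t in zip(signs, tau_loc))
        good += sat_sigma and sat_tau
    return good / 2 ** K
```

When the two configurations agree on the clause the result is $1 - 2^{-K}$; as soon as they differ on one variable it is $1 - 2^{1-K}$, which is the dichotomy used in the text.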

Considering the normalized logarithm of this quantity,

$$F(x, \alpha) = \lim_{N\to\infty} \frac{1}{N} \ln E(Z) = \ln 2 + H_2(x) + \alpha \ln\left(1 - 2^{1-K} + 2^{-K}(1-x)^K\right), \qquad (12)$$

where $H_2(x) = -x \ln x - (1-x)\ln(1-x)$ is the two-state entropy function,
one can deduce an upper bound for $\alpha_N(K,x)$. Indeed, $F(x,\alpha) < 0$
implies $\lim_{N\to\infty} P(Z(x, F) \ge 1) = 0$. Therefore:

Theorem 2.2 For each K and $0 < x < 1$, and for all $\alpha$ such that

$$\alpha > \alpha_{UB}(K,x) = -\frac{\ln 2 + H_2(x)}{\ln\left(1 - 2^{1-K} + 2^{-K}(1-x)^K\right)}, \qquad (13)$$

a random formula $F_K(N, N\alpha)$ is x-unsatisfiable w.h.p.
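
The bound (13) is explicit and cheap to evaluate. The following sketch (function names are illustrative assumptions) computes $\alpha_{UB}(K,x)$ and lists the grid points x that the first moment bound excludes at a given clause density.

```python
import math

def h2(x):
    """Two-state entropy H_2(x), with the convention 0 ln 0 = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def alpha_ub(K, x):
    """First moment upper bound of Eq. (13)."""
    return -(math.log(2) + h2(x)) / math.log(1 - 2 ** (1 - K) + 2 ** (-K) * (1 - x) ** K)

def excluded_distances(K, alpha, steps=1000):
    """Grid values of x for which alpha > alpha_UB(K, x), i.e. F(x, alpha) < 0."""
    return [i / steps for i in range(steps + 1) if alpha > alpha_ub(K, i / steps)]
```

For K = 8 and $\alpha = 166.1$ (the value used in Fig. 1), the excluded set contains an interval around x ≈ 0.13 but neither x = 0 nor x = 1/2, in line with the gap described in the caption of Fig. 1.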

We observe numerically that a 'gap' ($x_1$, $x_2$ and $\alpha$ such that
$x_1 < x < x_2 \Rightarrow F(x, \alpha) < 0$) appears for $K \ge 6$. More
generally, the following result holds, which implies Eq. (5) in Theorem 1.7.

Theorem 2.3 Let $\epsilon \in (0, 1)$, and $\{y_K\}_{K \in \mathbb{N}}$ be a
sequence verifying $K y_K \to \infty$ and $y_K = o(1)$. Denote by
$H_2^{-1}(u)$ the smallest root of $H_2(x) = u$, with $u \in [0, \ln 2]$.

There exists $K_0$ such that for all $K \ge K_0$,
$\alpha \in [(1+\epsilon) 2^{K-1} \ln 2, \alpha_N(K))$ and
$x \in [y_K, H_2^{-1}(\alpha 2^{1-K} - \ln 2 - \epsilon)] \cup [1 - H_2^{-1}(\alpha 2^{1-K} - \ln 2 - \epsilon), 1]$,
$F_K(N, N\alpha)$ is x-unsatisfiable w.h.p.

Proof. Clearly $(1+\epsilon) 2^{K-1} \ln 2 < \alpha_N(K)$ since
$\alpha_N(K) = 2^K \ln 2 - O_K(K)$ [18]. Observe that $(1 - y_K)^K = o(1)$.
Then for all $\delta > 0$, there exists $K_1$ such that for all $K \ge K_1$,
$x > y_K$:

$$\alpha_{UB}(x) < (1 + \delta)\, 2^{K-1} (\ln 2 + H_2(x)). \qquad (14)$$

Inverting this inequality yields the theorem. □


The choice of X, although it is the simplest one, is not optimal. The first
moment method only requires the condition $X \ge 1$ to be equivalent to
x-satisfiability, and better choices of X exist which improve the bound.
Techniques similar to the ones introduced separately by Dubois and Boufkhad
[14] on the one hand, and Kirousis, Kranakis and Krizanc [13] on the other
hand, can be used to obtain two tighter bounds. Quantitatively, it turns out
that these more elaborate bounds provide only very little improvement over
the simple bound (13) (see Fig. 2). For the sake of completeness, we give
without proof the simplest of these bounds:

Theorem 2.4 The unique positive solution of the equation 



H2(x) + aln 1 - 2 



,1-K 



+ (1 - x)ln 



2 — exp —Ka 



(l-xf) 

21-^- 2-^(1 -x)^-i 



+ xln 



exp 



-Ka- 



1 
-2 



n-K 



l-Ki 



+ 2 



1 — X 



X 







(15) 



1_ 21-^ + 2-^(1-0;)^', 
is an upper bound for $\alpha_N(K, x)$. For x = 0 we recover the expression of
[13].

The proof closely follows that of [13] and presents no notable difficulty. We
also derived a tighter bound based on the technique used in [14], gaining only
a small improvement over the bound of Theorem 2.4 (less than .001%).



3 Lower bound: the second moment method 



The second moment method uses the following consequence of Chebyshev's
inequality:

Lemma 3.1 If X is a non-negative random variable, one has:

$$P(X > 0) \ge \frac{E(X)^2}{E(X^2)}. \qquad (16)$$

It is well known that the simplest choice of X as the number of
SAT-assignments (in our case the number of SAT-x-pairs) is bound to fail. The
intuitive reason [17], [18] is that this naive choice favors pairs of
SAT-assignments with a great number of satisfying literals. It turns out that
such assignments are highly correlated, since they tend to agree with each
other, and this causes the failure of the second-moment method. In order to
deal with balanced (with approximately half of the literals satisfied) and
uncorrelated pairs of assignments, one must consider a weighted sum of all
SAT-assignments. Following [17], [18], we define:

Fig. 2. Comparison between the simple upper bound (13) for
$\alpha_N(K = 6, x)$ (top curve) and the refined one (bottom curve), as defined
in Theorem 2.4.

$$Z(x, F) = \sum_{\vec\sigma, \vec\tau} \delta\left(d_{\vec\sigma\vec\tau} = \lfloor Nx \rfloor\right) W(\vec\sigma, \vec\tau, F), \qquad (17)$$

where $\lfloor Nx \rfloor$ denotes the integer part of Nx. Note that the
condition $d_{\vec\sigma\vec\tau} = \lfloor Nx \rfloor$ is stronger than
Eq. (3). The weights $W(\vec\sigma, \vec\tau, F)$ are decomposed according to
each clause:

$$W(\vec\sigma, \vec\tau, F) = \prod_c W(\vec\sigma, \vec\tau, c), \qquad (18)$$
$$\text{with } W(\vec\sigma, \vec\tau, c) = W(u, v), \qquad (19)$$

where u, v are K-component vectors such that: $u_i = 1$ if the i-th literal of
c is satisfied under $\vec\sigma$, and $u_i = -1$ otherwise (here we assume
that the variables connected to c are arbitrarily ordered). v is defined in the
same way with respect to $\vec\tau$. In order to have the equivalence between
$Z > 0$ and the existence of pairs of SAT-assignments, we impose the following
condition on the weights:

$$W(u, v) \begin{cases} = 0 & \text{if } u = (-1, \dots, -1) \text{ or } v = (-1, \dots, -1), \\
> 0 & \text{otherwise.} \end{cases} \qquad (20)$$



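Let us now compute the first and second moments of Z:

Claim 3.2

$$E(Z) = 2^N \binom{N}{\lfloor Nx \rfloor} f_1(x)^M, \qquad (21)$$

where

$$f_1(x) = E[W(\vec\sigma, \vec\tau, c)] \qquad (22)$$
$$\phantom{f_1(x)} = 2^{-K} \sum_{u,v} W(u, v)\, (1-x)^{|u \cdot v|}\, x^{K - |u \cdot v|}. \qquad (23)$$

Here $|u|$ is the number of indices i such that $u_i = +1$, and $u \cdot v$
denotes the vector $(u_1 v_1, \dots, u_K v_K)$.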

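Writing the second moment is a little more cumbersome:

Claim 3.3

$$E(Z^2) = 2^N \sum_{\mathbf{a} \in V_N \cap \{0, 1/N, 2/N, \dots, 1\}^8} \frac{N!}{\prod_{i=0}^{7} (N a_i)!}\, f_2(\mathbf{a})^M, \qquad (24)$$

where

$$f_2(\mathbf{a}) = E[W(\vec\sigma, \vec\tau, c)\, W(\vec\sigma', \vec\tau', c)]$$
$$= 2^{-K} \sum_{u,v,u',v'} W(u,v)\, W(u',v') \prod_{i=1}^{K} \Big(
a_0^{\delta(u_i = v_i = u'_i = v'_i)}\,
a_1^{\delta(u_i = v_i = u'_i \neq v'_i)}\,
a_2^{\delta(u_i = v_i = v'_i \neq u'_i)}\,
a_3^{\delta((u_i = v_i) \neq (u'_i = v'_i))}$$
$$\times\,
a_4^{\delta(u_i = u'_i = v'_i \neq v_i)}\,
a_5^{\delta((u_i = u'_i) \neq (v_i = v'_i))}\,
a_6^{\delta((u_i = v'_i) \neq (v_i = u'_i))}\,
a_7^{\delta(v_i = u'_i = v'_i \neq u_i)} \Big). \qquad (25)$$

$\mathbf{a}$ is an 8-component vector giving the proportion of each type of
quadruplets $(\tau_i, \sigma_i, \tau'_i, \sigma'_i)$, with $\vec\tau$ being
arbitrarily (but without loss of generality) fixed to $(1, \dots, 1)$, as
described in the following table: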
              a_0  a_1  a_2  a_3  a_4  a_5  a_6  a_7
  \tau_i       +    +    +    +    +    +    +    +
  \sigma_i     +    +    +    +    -    -    -    -
  \tau'_i      +    +    -    -    +    +    -    -
  \sigma'_i    +    -    +    -    +    -    +    -

$V_N$ is a simplex specified by:

$$\begin{cases}
\sum_{i=0}^{7} a_i = 1, \\
\lfloor N(a_4 + a_5 + a_6 + a_7) \rfloor = \lfloor Nx \rfloor, \\
\lfloor N(a_1 + a_2 + a_5 + a_6) \rfloor = \lfloor Nx \rfloor.
\end{cases} \qquad (26)$$



These three conditions (26) correspond to the normalization of the proportions
and to the enforcement of the conditions
$d_{\vec\sigma\vec\tau} = \lfloor Nx \rfloor$,
$d_{\vec\sigma'\vec\tau'} = \lfloor Nx \rfloor$. When $N \to \infty$,
$V = \bigcap_{N \in \mathbb{N}} V_N$ defines a five-dimensional simplex
described by the three hyperplanes:

$$\begin{cases}
\sum_{i=0}^{7} a_i = 1, \\
a_4 + a_5 + a_6 + a_7 = x, \\
a_1 + a_2 + a_5 + a_6 = x.
\end{cases} \qquad (27)$$

In order to yield an asymptotic estimate of $E(Z^2)$ we first use the following
lemma, which results from a simple approximation of integrals by sums:

Lemma 3.4 Let $\psi(\mathbf{a})$ be a real, positive, continuous function of
$\mathbf{a}$, and let $V_N$, V be defined as previously. Then there exists a
constant $C_0$ depending on x such that for sufficiently large N:

$$\sum_{\mathbf{a} \in V_N \cap \{1/N, 2/N, \dots, 1\}^8} \frac{N!}{\prod_{i=0}^{7} (N a_i)!}\, \psi(\mathbf{a})^N \le C_0 N^{3/2} \int_V d\mathbf{a}\; e^{N[H_8(\mathbf{a}) + \ln \psi(\mathbf{a})]}, \qquad (28)$$

where $H_8(\mathbf{a}) = -\sum_{i=0}^{7} a_i \ln a_i$.

A standard Laplace method used on Eq. (28) with $\psi = 2 (f_2)^{\alpha}$
yields:

Claim 3.5 For each K, x, define:

$$\Phi(\mathbf{a}) = H_8(\mathbf{a}) - \ln 2 - 2 H_2(x) + \alpha \ln f_2(\mathbf{a}) - 2\alpha \ln f_1(x), \qquad (29)$$

and let $\mathbf{a}_0 \in V$ be the global maximum of $\Phi$ restricted to V.
Suppose that $\partial^2_{\mathbf{a}} \Phi(\mathbf{a}_0)$ is definite negative.
Then there exists a constant $C_1$ such that, for N sufficiently large,

$$\frac{E(Z)^2}{E(Z^2)} \ge C_1 \exp(-N \Phi(\mathbf{a}_0)). \qquad (30)$$

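Obviously $\Phi(\mathbf{a}_0) \ge 0$ in general. In order to use Lemma 3.1 one
must find the weights $W(u, v)$ in such a way that
$\max_{\mathbf{a} \in V} \Phi(\mathbf{a}) = 0$. We first notice that, at the
particular point $\mathbf{a}^*$ where the two pairs are uncorrelated with each
other,

$$a^*_0 = a^*_3 = \frac{(1-x)^2}{2}, \qquad
a^*_1 = a^*_2 = a^*_4 = a^*_7 = \frac{x(1-x)}{2}, \qquad
a^*_5 = a^*_6 = \frac{x^2}{2}, \qquad (31)$$

we have the following properties:

• $H_8(\mathbf{a}^*) = \ln 2 + 2 H_2(x)$,

• $\partial_{\mathbf{a}} H_8(\mathbf{a}^*) = 0$, $\partial^2_{\mathbf{a}} H_8(\mathbf{a}^*)$ definite negative,

• $f_1(x)^2 = f_2(\mathbf{a}^*)$ and hence $\Phi(\mathbf{a}^*) = 0$.

(Note that the derivatives $\partial_{\mathbf{a}}$ are taken in the simplex V.)
So the weights must be chosen in such a way that $\mathbf{a}^*$ be the global
maximum of $\Phi$. A necessary condition is that $\mathbf{a}^*$ be a local
maximum, which entails $\partial_{\mathbf{a}} f_2(\mathbf{a}^*) = 0$.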
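
The properties of the uncorrelated point can be verified numerically. The short sketch below (helper names are assumptions for illustration) builds $\mathbf{a}^*$ from Eq. (31), checks the normalization and the two distance constraints of Eq. (27), and compares its entropy with $\ln 2 + 2H_2(x)$.

```python
import math

def a_star(x):
    """Uncorrelated point of Eq. (31), returned as [a0, ..., a7]."""
    p, q, r = (1 - x) ** 2 / 2, x * (1 - x) / 2, x ** 2 / 2
    return [p, q, q, p, q, r, r, q]

def entropy(a):
    """H_8(a) = -sum_i a_i ln a_i, skipping zero entries."""
    return -sum(v * math.log(v) for v in a if v > 0)

def h2(x):
    """Two-state entropy for 0 < x < 1."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)
```

For instance, at x = 0.3 one finds `sum(a_star(0.3))` equal to 1 and `entropy(a_star(0.3))` equal to `math.log(2) + 2 * h2(0.3)` up to rounding, as stated in the first bullet.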

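Using the fact that the number of common values between four vectors
$u, v, u', v' \in \{-1, 1\}^K$ can be written as:

$$\frac{1}{8}\left(K + u \cdot v + u \cdot u' + u \cdot v' + v \cdot u' + v \cdot v' + u' \cdot v' + u \cdot v \cdot u' \cdot v'\right), \qquad (32)$$

we deduce from $\partial_{\mathbf{a}} f_2(\mathbf{a}^*) = 0$ the condition:

$$\sum_{u,v} W(u,v)\; u\; (1-x)^{|u \cdot v|}\, x^{K - |u \cdot v|} = 0, \qquad (33)$$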



= K(2x-l] 



^ («,{/)(! - a;) l^lx^-l^l 



u,v 



+ 



+2{2x - 1) 



^iy(u,t;)(l-x)l^la;^-l'^l 



(34) 
If we suppose that W is invariant under simultaneous and identical
permutations of the $u_i$ or of the $v_i$ (which we must, since the ordering of
the variables by the label i is arbitrary), the K components of all vectorial
quantities in Eqs. (33), (34) should be equal. Then we obtain equivalently:

$$\sum_{u,v} W(u,v)\,(2|u| - K)\,(1-x)^{|u \cdot v|}\, x^{K - |u \cdot v|} = 0 \quad \text{and } u \leftrightarrow v, \qquad (35)$$
$$\sum_{u,v} W(u,v)\,\left(K(2x - 1) + u \cdot v\right)(1-x)^{|u \cdot v|}\, x^{K - |u \cdot v|} = 0. \qquad (36)$$

We choose the following simple form for $W(u,v)$:

$$W(u,v) = \begin{cases} 0 & \text{if } u = (-1, \dots, -1) \text{ or } v = (-1, \dots, -1), \\
\lambda^{|u| + |v|}\, \nu^{|u \cdot v|} & \text{otherwise.} \end{cases} \qquad (37)$$

Although this choice is certainly not optimal, it turns out to be particularly
tractable. Eqs. (35) and (36) simplify to:

$$\begin{cases}
[\nu(1-x)]^{K-1} = (\lambda^2 + 1 - 2\lambda\nu)\left(2\lambda x + \nu(1-x)(1+\lambda^2)\right)^{K-1}, \\
[\nu(1-x) + \lambda x]^{K-1} = (1 - \lambda\nu)\left(2\lambda x + \nu(1-x)(1+\lambda^2)\right)^{K-1}.
\end{cases} \qquad (38)$$

We found numerically a unique solution $\lambda > 0$, $\nu > 0$ to these
equations for any value of $K \ge 2$ that we checked.

Fixing $(\lambda, \nu)$ to a solution of (38), we seek the largest value of
$\alpha$ such that the local maximum $\mathbf{a}^*$ is a global maximum, i.e.
such that there exists no $\mathbf{a} \in V$ with $\Phi(\mathbf{a}) > 0$. To
proceed one needs analytical expressions for $f_1(x)$ and $f_2(\mathbf{a})$.
$f_1$ simply reads:

$$f_1(x) = 2^{-K}\left((1-x)\nu(1+\lambda^2) + 2x\lambda\right)^K - 2 \cdot 2^{-K}\left(x\lambda + (1-x)\nu\right)^K + 2^{-K}\left((1-x)\nu\right)^K. \qquad (39)$$
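
The closed form (39) follows from (23) and (37) by inclusion-exclusion over the two excluded vectors. As a sanity check, the sketch below compares a brute-force evaluation of the double sum (23) with the closed form for a small K. The function names and the test values of $\lambda$, $\nu$, x are arbitrary choices for illustration and need not solve (38).

```python
from itertools import product

def weight(u, v, lam, nu):
    """Weight W(u, v) of Eq. (37); u and v are tuples in {-1,+1}^K."""
    if all(c == -1 for c in u) or all(c == -1 for c in v):
        return 0.0
    agree = sum(a * b == 1 for a, b in zip(u, v))
    return lam ** (u.count(1) + v.count(1)) * nu ** agree

def f1_bruteforce(K, x, lam, nu):
    """Direct evaluation of the double sum in Eq. (23)."""
    total = 0.0
    for u in product((-1, 1), repeat=K):
        for v in product((-1, 1), repeat=K):
            agree = sum(a == b for a, b in zip(u, v))
            total += weight(u, v, lam, nu) * (1 - x) ** agree * x ** (K - agree)
    return total / 2 ** K

def f1_closed(K, x, lam, nu):
    """Closed form of Eq. (39)."""
    full = ((1 - x) * nu * (1 + lam ** 2) + 2 * x * lam) ** K
    one_all_minus = (x * lam + (1 - x) * nu) ** K
    both_all_minus = ((1 - x) * nu) ** K
    return (full - 2 * one_all_minus + both_all_minus) / 2 ** K
```

The two evaluations agree to machine precision for any parameters, since (39) is an exact identity for the weights (37).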

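$f_2$ is calculated by Sylvester's formula, but its expression is long and
requires preliminary notations. We index the 16 possibilities for
$(u_i, v_i, u'_i, v'_i)$ by a number $r \in \{0, \dots, 15\}$ defined as:

$$r = 8\,\frac{1 - u_i}{2} + 4\,\frac{1 - v_i}{2} + 2\,\frac{1 - u'_i}{2} + \frac{1 - v'_i}{2}. \qquad (40)$$

For each index r, define

$$l(r) = \delta(u_i = 1) + \delta(v_i = 1) + \delta(u'_i = 1) + \delta(v'_i = 1), \qquad (41)$$
$$n(r) = \delta(u_i v_i = 1) + \delta(u'_i v'_i = 1), \qquad (42)$$

and

$$z_r = \lambda^{l(r)}\, \nu^{n(r)} \times \begin{cases} a_r & \text{if } r \le 7, \\ a_{15-r} & \text{if } r \ge 8. \end{cases} \qquad (43)$$

Also define the four following subsets of $\{0, \dots, 15\}$: $A_0$ is the set
of indices r corresponding to quadruplets of the form $(-1, v_i, u'_i, v'_i)$,
i.e. $A_0 = \{r \in \{0, \dots, 15\} \mid u_i = -1\}$. Similarly,
$A_1 = \{r \mid v_i = -1\}$, $A_2 = \{r \mid u'_i = -1\}$ and
$A_3 = \{r \mid v'_i = -1\}$.

Then $f_2$ is given by:

$$f_2(\mathbf{a}) = 2^{-K}\Bigg[\Big(\sum_{j=0}^{15} z_j\Big)^K - \sum_{k=0}^{3} \Big(\sum_{j \in A_k} z_j\Big)^K + \sum_{0 \le k < k' \le 3} \Big(\sum_{j \in A_k \cap A_{k'}} z_j\Big)^K$$
$$- \sum_{0 \le k < k' < k'' \le 3} \Big(\sum_{j \in A_k \cap A_{k'} \cap A_{k''}} z_j\Big)^K + \Big(\sum_{j \in A_0 \cap A_1 \cap A_2 \cap A_3} z_j\Big)^K\Bigg]. \qquad (44)$$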

We can now state our lower-bound result:

Lemma 3.6 Let $\alpha_M \in (0, +\infty]$ be the smallest $\alpha$ such that
$\partial^2_{\mathbf{a}} \Phi(\mathbf{a}^*)$ is not definite negative. For each
K and $x \in (0, 1)$, and for all $\alpha < \alpha_{LB}(K,x)$, with

$$\alpha_{LB}(K, x) = \min\left[\alpha_M,\; \inf_{\mathbf{a} \in V_+} \frac{\ln 2 + 2H_2(x) - H_8(\mathbf{a})}{\ln f_2(\mathbf{a}) - 2\ln f_1(x)}\right], \qquad (45)$$

where $V_+ = \{\mathbf{a} \in V \mid f_2(\mathbf{a}) > f_1^2(x)\}$, and where
$(\lambda, \nu)$ is chosen to be a positive solution of (38), the probability
that a random formula $F_K(N, N\alpha)$ is x-satisfiable is bounded away from 0
as $N \to \infty$.

This is a straightforward consequence of the expression (29) of
$\Phi(\mathbf{a})$.

Theorem 1.6 and Lemma 3.6 immediately imply:

Theorem 3.7 For all $\alpha < \alpha_{LB}(K,x)$ defined in Lemma 3.6, a random
K-CNF formula $F_K(N, N\alpha)$ is x-satisfiable w.h.p.

We devised several numerical strategies to evaluate $\alpha_{LB}(K, x)$. The
implementation of Powell's method on each point of a grid of size
$\mathcal{N}^5$ ($\mathcal{N} = 10, 15, 20$) on V turned out to be the most
efficient and reliable. The results are given by Fig. 1 for K = 8, the smallest
K such that the picture given by Conjecture 1.4 is confirmed. We found a
clustering phenomenon for all the values of $K \ge 8$ that we checked. In the
following we shall provide a rigorous estimate of $\alpha_{LB}(K, 1/2)$ at
large K.



4 Large K analysis

4.1 Asymptotics for x = 1/2

The main result of this section is contained in the following theorem, which
implies Eq. (7) in Theorem 1.7.

Theorem 4.1 The large K asymptotics of $\alpha_{LB}(K,x)$ at x = 1/2 is given
by:

$$\alpha_{LB}(K, 1/2) \sim 2^K \ln 2. \qquad (46)$$

The proof primarily relies on the following results:

Claim 4.2 Let $\nu = 1$ and $\lambda$ be the unique positive root of:

$$(1 - \lambda)(1 + \lambda)^{K-1} - 1 = 0. \qquad (47)$$

Then $(\lambda, \nu)$ is solution to (38) with x = 1/2 and one has, at large K:

$$\lambda - 1 \sim -2^{1-K}. \qquad (48)$$
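
Equation (47) is easy to solve numerically, which gives a quick check of the asymptotics (48). The sketch below uses plain bisection on the interval [1/2, 1]; the choice of bracket (valid for K ≥ 3, where the left-hand side of (47) is positive at 1/2 and equals -1 at 1) and the function name are assumptions made for this illustration.

```python
def lambda_root(K, iterations=200):
    """Positive root of (1 - lam) * (1 + lam)**(K - 1) - 1 = 0 for K >= 3,
    found by bisection on [0.5, 1]."""
    def g(lam):
        return (1 - lam) * (1 + lam) ** (K - 1) - 1

    lo, hi = 0.5, 1.0
    assert g(lo) > 0 > g(hi), "bracket assumes K >= 3"
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For K = 10 the ratio $(1-\lambda)/2^{1-K}$ is already within a few percent of 1, consistent with (48).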

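Lemma 4.3 Let x = 1/2. There exist $K_0$, $C_1 > 0$ and $C_2 > 0$ such that for
all $K \ge K_0$, and for all $\mathbf{a} \in V$ s.t.
$|\mathbf{a} - \mathbf{a}^*| < 1/8$,

$$\left|\ln f_2(\mathbf{a}) - 2\ln f_1(1/2)\right| \le K^2 C_1 |\mathbf{a} - \mathbf{a}^*|^2\, 2^{-2K} + C_2 |\mathbf{a} - \mathbf{a}^*|^3\, 2^{-K}. \qquad (49)$$

Lemma 4.4 Let x = 1/2. There exist $K_0 > 0$, $C_0 > 0$ such that for
$K \ge K_0$, for all $\mathbf{a} \in V$,

$$\left|\ln f_2(\mathbf{a}) - 2\ln f_1(1/2)\right| \le 2^{-K}\Big[(a_0 + a_1 + a_4 + a_5)^K + (a_0 + a_2 + a_4 + a_6)^K$$
$$+ (a_0 + a_1 + a_6 + a_7)^K + (a_0 + a_2 + a_5 + a_7)^K\Big] + C_0 K 2^{-2K}. \qquad (50)$$

The proofs of these lemmas are deferred to Sections 4.3 and 4.4.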
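4.2 Proof of Theorem 4.1

We first show that $\partial^2_{\mathbf{a}} \Phi(\mathbf{a}^*)$ is definite
negative for all $\alpha < 2^K$, when K is sufficiently large. Indeed
$\partial^2_{\mathbf{a}} H_8(\mathbf{a}^*)$ is definite negative and its
largest eigenvalue is $-4$. Using Lemma 4.3, for $\mathbf{a} \in V$ close
enough to $\mathbf{a}^*$:

$$\Phi(\mathbf{a}) \le -2|\mathbf{a} - \mathbf{a}^*|^2 + \alpha C_1 |\mathbf{a} - \mathbf{a}^*|^2 K^2 2^{-2K} + \alpha C_2 |\mathbf{a} - \mathbf{a}^*|^3 2^{-K}. \qquad (51)$$

Therefore

$$\Phi(\mathbf{a}) < -|\mathbf{a} - \mathbf{a}^*|^2 \quad \text{for K large enough, } |\mathbf{a} - \mathbf{a}^*| < \frac{1}{2C_2} \text{ and } \alpha < 2^K. \qquad (52)$$

Using Lemma 3.6, we need to find the minimum, for $\mathbf{a} \in V_+$, of

$$G(K, \mathbf{a}) = \frac{3\ln 2 - H_8(\mathbf{a})}{\ln f_2(\mathbf{a}) - 2\ln f_1(1/2)}. \qquad (53)$$

We shall show that

$$\inf_{\mathbf{a} \in V_+} G(K, \mathbf{a}) \sim 2^K \ln 2. \qquad (54)$$

We divide this task in two parts. The first part states that there exist
$R > 0$ and $K_1$ such that for all $K \ge K_1$, and for all
$\mathbf{a} \in V_+$ such that $|\mathbf{a} - \mathbf{a}^*| < R$,
$G(K, \mathbf{a}) > 2^K$. This is a consequence of Lemma 4.3; using the fact
that $3\ln 2 - H_8(\mathbf{a}) \ge |\mathbf{a} - \mathbf{a}^*|^2$ for
$\mathbf{a}$ close enough to $\mathbf{a}^*$, one obtains:

$$G(K, \mathbf{a}) \ge \frac{1}{C_1 K^2 2^{-2K} + C_2 |\mathbf{a} - \mathbf{a}^*|\, 2^{-K}}, \qquad (55)$$

which, for K large enough and $\mathbf{a}$ close enough to $\mathbf{a}^*$, is
greater than $2^K$.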

The second part deals with the case where $\mathbf{a}$ is far from
$\mathbf{a}^*$, i.e. $|\mathbf{a} - \mathbf{a}^*| > R$. First we put a bound on
the numerator of $G(K, \mathbf{a})$: there exists a constant $C_3 > 0$ such
that for all $\mathbf{a} \in V$ s.t. $|\mathbf{a} - \mathbf{a}^*| > R$, one has
$3\ln 2 - H_8(\mathbf{a}) > C_3$.

Looking at Eq. (50), it is clear that, in order to minimize
$G(K, \mathbf{a})$, $\mathbf{a}$ should be 'close' to at least one of the four
hyperplanes defined by

$$a_0 + a_1 + a_4 + a_5 = 1, \qquad a_0 + a_2 + a_4 + a_6 = 1,$$
$$a_0 + a_1 + a_6 + a_7 = 1, \qquad a_0 + a_2 + a_5 + a_7 = 1. \qquad (56)$$

More precisely, we say for instance that $\mathbf{a}$ is close to the first
hyperplane defined above iff

$$a_0 + a_1 + a_4 + a_5 > 1 - K^{-1/2}. \qquad (57)$$
Now suppose that $\mathbf{a}$ is not close to that hyperplane. Then the
corresponding term goes to 0:

$$(a_0 + a_1 + a_4 + a_5)^K \le \left(1 - K^{-1/2}\right)^K \sim \exp(-\sqrt{K}) \quad \text{as } K \to \infty. \qquad (58)$$

We classify all possible cases according to the number of hyperplanes
$\mathbf{a} \in V_+$ is close to:

• $\mathbf{a}$ is close to none of the hyperplanes. Then

$$G(K, \mathbf{a}) \ge \frac{2^K C_3}{4\exp(-\sqrt{K}) + C_0 K 2^{-K}} > 2^K \quad \text{for K large enough.} \qquad (59)$$

• $\mathbf{a}$ is close to one hyperplane only, e.g. the first hyperplane
$a_0 + a_1 + a_4 + a_5 = 1$ (the other hyperplanes are treated equivalently).
As $\sum_{i=0}^{7} a_i = 1$, one has

$$a_2 < K^{-1/2}, \quad a_3 < K^{-1/2}, \quad a_6 < K^{-1/2}, \quad a_7 < K^{-1/2}. \qquad (60)$$

This implies $H_8(\mathbf{a}) < 2\ln 2 + 2\ln K/\sqrt{K}$, and we get:

$$G(K, \mathbf{a}) \ge \frac{2^K\left[\ln 2 - 2\ln K/\sqrt{K}\right]}{1 + C_0 K 2^{-K} + 3e^{-\sqrt{K}}} > 2^K (\ln 2)\left[1 - 3\ln K/\sqrt{K}\right] \qquad (61)$$

for sufficiently large K.

• $\mathbf{a}$ is close to two hyperplanes. It is easy to check that these
hyperplanes must be either the first and the fourth ones, or the second and
the third ones. In the first case we have $a_0 + a_5 > 1 - 3/\sqrt{K}$ and in
the second case $a_0 + a_6 > 1 - 3/\sqrt{K}$. Both cases imply:
$H_8(\mathbf{a}) < \ln 2 + 3\ln K/\sqrt{K}$. One thus obtains:

$$G(K, \mathbf{a}) \ge \frac{2^K\left[2\ln 2 - 3\ln K/\sqrt{K}\right]}{2 + C_0 K 2^{-K} + 2e^{-\sqrt{K}}} > 2^K (\ln 2)\left[1 - 3\ln K/\sqrt{K}\right]. \qquad (62)$$

• One can check that $\mathbf{a}$ cannot be close to more than two hyperplanes.

To sum up, we have proved that for K large enough, for all
$\mathbf{a} \in V_+$,

$$G(K, \mathbf{a}) \ge 2^K (\ln 2)\left[1 - 3\ln K/\sqrt{K}\right]. \qquad (63)$$

Clearly, $\alpha_{LB}(K, 1/2) = \min_{\mathbf{a} \in V_+} G(K, \mathbf{a}) \le \alpha_{UB}(K, 1/2)$.
Since from Theorem 2.2 we know that $\alpha_{UB}(K, 1/2) \sim 2^K \ln 2$, this
proves Eq. (54).

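4.3 Proof of Lemma 4.3

Let x = 1/2 and choose $\nu = 1$ and $\lambda$ the unique positive root of
Eq. (47). Let $\epsilon_i = a_i - 1/8$, and
$\epsilon = (\epsilon_0, \dots, \epsilon_7)$. We expand $f_2(\mathbf{a})$ in
series of $\epsilon$. The zeroth order term is
$f_2(1/8, \dots, 1/8) = f_1^2(1/2)$. The first order term vanishes. We thus
get:

$$f_2(\mathbf{a}) = f_1^2(1/2) + B_0 - B_1 + B_2 - B_3 + B_4, \qquad (64)$$

with

$$B_0 = \sum_{q=2}^{K} \binom{K}{q} \left(\frac{1}{2}\sum_{i=0}^{7} p_i(\lambda)\,\epsilon_i\right)^q \left(\frac{1+\lambda}{2}\right)^{4(K-q)}, \qquad (65)$$
$$B_1 = 2^{-K} \sum_{a=1}^{4} \sum_{q=2}^{K} \binom{K}{q} \left(\sum_{i=0}^{7} \left(\lambda^{l_a(i)} - 1\right)\epsilon_i\right)^q \left(\frac{1+\lambda}{2}\right)^{3(K-q)}, \qquad (66)$$
$$B_2 = 2^{-2K} \sum_{a=1}^{6} \sum_{q=2}^{K} \binom{K}{q} \left[2 r_a(\lambda, \epsilon)\right]^q \left(\frac{1+\lambda}{2}\right)^{2(K-q)}, \qquad (67)$$
$$B_3 = 2^{-3K} \sum_{a=1}^{4} \sum_{q=2}^{K} \binom{K}{q} \left[4 s_a(\lambda, \epsilon)\right]^q \left(\frac{1+\lambda}{2}\right)^{K-q}, \qquad (68)$$
$$B_4 = 2^{-4K} \sum_{q=2}^{K} \binom{K}{q} (8\epsilon_0)^q. \qquad (69)$$

In $B_0$, $p_i(\lambda) = \lambda^{l(i)} + \lambda^{l(15-i)} - 2 - 4(\lambda - 1)$.
We have used the fact that $\sum_{i=0}^{7} \epsilon_i = 0$. Using
$l(i) + l(15 - i) = 4$, one obtains
$|p_i(\lambda)| \le 11(\lambda - 1)^2 \le 11 \cdot 2^{4-2K}$, since
$|\lambda - 1| \le 2^{2-K}$ for K large enough, by virtue of Claim 4.2.

In $B_1$, we have used again $\sum_{i=0}^{7} \epsilon_i = 0$. $l_a(i)$ is
either $l(i)$ or $l(15 - i)$, depending on a. In both cases
$|\lambda^{l_a(i)} - 1| \le 4|\lambda - 1| \le 2^{4-K}$. In $B_2$ and $B_3$,
the expressions of $r_a(\lambda, \epsilon)$ and $s_a(\lambda, \epsilon)$ are
given by:

$$r_1 = \epsilon_0 + \lambda(\epsilon_1 + \epsilon_2) + \lambda^2 \epsilon_3, \qquad r_2 = \epsilon_0 + \lambda(\epsilon_1 + \epsilon_4) + \lambda^2 \epsilon_5,$$
$$r_3 = \epsilon_0 + \lambda(\epsilon_2 + \epsilon_4) + \lambda^2 \epsilon_6, \qquad r_4 = \epsilon_0 + \lambda(\epsilon_1 + \epsilon_7) + \lambda^2 \epsilon_6, \qquad (70)$$
$$r_5 = \epsilon_0 + \lambda(\epsilon_2 + \epsilon_7) + \lambda^2 \epsilon_5, \qquad r_6 = \epsilon_0 + \lambda(\epsilon_4 + \epsilon_7) + \lambda^2 \epsilon_3,$$

$$s_1 = \epsilon_0 + \lambda\epsilon_1, \quad s_2 = \epsilon_0 + \lambda\epsilon_2, \quad s_3 = \epsilon_0 + \lambda\epsilon_4, \quad s_4 = \epsilon_0 + \lambda\epsilon_7. \qquad (71)$$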

In order to prove Lemma 4.3 we will use the following fact:
Claim 4.5 Let y be a real variable such that \y\ < 1. Then 

K{K -I) 



(72) 



One has $|2r_a| \le 8|\epsilon|$, $|4s_a| \le 8|\epsilon|$, and
$|8\epsilon_0| \le 8|\epsilon|$. Therefore, for $|\epsilon| < 1/8$ one can
write:
|5o| < . 26)22-4^|e|2 + (11 . 26)32-5^|e|3

|Ei|<4 ^^^~^^ 2^^2-=^^|eP + 22i2-3^|eP 
I II- 2 ' ' ^ ' ' 

\B,\ < Q ^(-^^-^) 262-^^|e|2 + 292-(-i)^|e|=^ for 2 < ^ < 4. 



(73) 
(74) 
(75) 



Observe that 



him 



K 



-K 



1 + 0(7^2-^) 



and that for K large enough, 

/2(a) 



In 

which proves Lemma 4.3.

4.4 Proof of Lemma 4.4



mm 



< 



(76) 



(77) 



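Note that the bounds on $B_0$ and $B_1$ (73), (74) remain valid for any
$\epsilon$. Therefore $B_0 = O(2^{-2K})$ and $B_1 = O(2^{-2K})$ uniformly. We
bound $B_3$ by observing that:

$$B_3 = 2^{-K}\left[(a_0 + \lambda a_1)^K + (a_0 + \lambda a_2)^K + (a_0 + \lambda a_4)^K + (a_0 + \lambda a_7)^K\right]$$
$$- 2^{-3K} \sum_{a=1}^{4} \left[\frac{1+\lambda}{2}\right]^K \left[1 + K\,\frac{8 s_a(\lambda, \epsilon)}{1+\lambda}\right]. \qquad (78)$$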

Since (oq + Aai) < cto + oi < 1/2 and likewise for the three other terms, one 
has S3 = 0(2~^^) uniformly in a. A similar argument yields B^ = 0(2~^^). 
There remains B2, which we write as: 



K 



0<A;<A:'<3 \jeAfenylj., 



- 2" 



-2K 



E 

a=l 



1 + A 



l + K 



^8r.(A,e) 
.(1 + A)2 



(79) 



The second term of the sum is 0{K2~'^^). The first term is made of six 
contributions. Two of them, namely 2~^(ao + A(ai + 02) + A^os) and 2~^(ao + 
A(a4 + a7) + A^as), are 0(2"^-^), because of the condition on distances. Among 
the four remaining contributions, we show how to deal with one of them, the 
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others being handled similarly. This contribution can be written as: 



\ 00 + 01 + 04 + 05 J 

(80) 

We distinguish two cases. Either oq + oi + 04 + 05 < 1/2, and we get trivially: 
(oo + A(oi + 04) + A^og)^ - (oo + oi + 04 + 05)^ = 0(2"^), (81) 
since both terms are 0(2~^); or Oq + Oi + 04 + 05 > 1/2, and then: 



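$$\left| (a_0 + \lambda(a_1 + a_4) + \lambda^2 a_5)^K - (a_0 + a_1 + a_4 + a_5)^K \right| \le \left| \left[ 1 + \frac{(\lambda - 1)(a_1 + a_4) + (\lambda^2 - 1) a_5}{a_0 + a_1 + a_4 + a_5} \right]^K - 1 \right| = O(K 2^{-K}). \tag{82}$$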



Using again Eq. (76) finishes the proof of Lemma 4.4.



□ 



4.5 Heuristics for arbitrary x

For arbitrary x, the function to minimize in fj^5l) is hard to study analytically. 
Here we present what we believe to be the correct asymptotic expansion of 
$\alpha_{LB}(K,x)$ at large $K$. Hopefully this tentative analysis could be used as a
starting point towards a rigorous analytical treatment for any x. 

A careful look at the numerics suggests the following Ansatz on the position 
of the global maximum, at large K: 

$$a_0 = 1 - x + o(1), \quad a_6 = x + o(1), \quad a_i = o(1) \ \text{for } i \ne 0, 6. \tag{83}$$

A second, symmetric, maximum also exists around $a_0 = 1 - x$, $a_5 = x$. Plugging
this locus into Eq. fHSl) leads to the following conjecture: 

Conjecture 4.6 For all $x \in (0, 1]$, the asymptotics of $\alpha_{LB}(K,x)$ is given by:

hm 2-^aMK,x) = ^-^^^^, (84) 

and the limit is uniform on any closed sub-interval of (0, 1]. 

This conjecture is consistent with both our numerical simulations and our 
result at $x = 1/2$.

5 Proof of Theorem 1.6



Starting with the sharpness criterion for monotone properties of the hypercube
given by E. Friedgut and J. Bourgain, we will prove Theorem 1.6 by
using techniques and tools developed by N. Creignou and H. Daudé for
proving the sharpness of monotone properties in random CSPs.

First we make precise some notations for this study on random $K$-CNF
formulae over $N$ Boolean variables $\{x_1, \ldots, x_N\}$. A $K$-clause $C$ is given in
disjunctive form: $C = x_{i_1}^{\varepsilon_1} \vee \ldots \vee x_{i_K}^{\varepsilon_K}$ where $\varepsilon_j \in \{0, 1\}$ ($x_i^0$ is the positive literal
$x_i$ and $x_i^1$ is the negative one $\bar{x}_i$). A $K$-CNF formula $F$ is a finite conjunction
of $K$-clauses. $\Omega(F)$ will denote the set of distinct variables occurring in $F$,
$\Omega(F) \subset \{x_1, \ldots, x_N\}$. In this Boolean framework, $S(F)$, the set of satisfying
assignments to $F$, becomes a subset of $\{0, 1\}^N$.

Now, let us recall how a slight change of our probability measure on formulae
gives a convenient product probability space for studying x-satisfiability.

5.1 x-unsatisfiability as a monotone property

In our case the number of clauses in a random formula $F_K(N, N\alpha)$ is fixed to
$M = N\alpha$. We define another kind of random formula $G_K(N, N\alpha)$ by allowing
each of the $\mathcal{N} = 2^K \binom{N}{K}$ possible clauses to be present with probability $p =
\alpha N / \mathcal{N}$. Then, assigning 1 to each clause if it is present and 0 otherwise, the
hypercube $\{0, 1\}^{\mathcal{N}}$ stands for the set of all possible formulae, endowed with
the so-called product measure $\mu_p$, where $p$ is the probability for 1, and $1 - p$
for 0.
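
The product-measure model can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names `all_clauses` and `sample_formula`, and the encoding of a clause as a tuple of (variable, sign) pairs, are hypothetical choices made here and do not come from the text. Each of the $2^K \binom{N}{K}$ clauses is kept independently with probability $p$, so the number of clauses fluctuates around $\alpha N$ instead of being fixed.

```python
import itertools
import math
import random


def all_clauses(n, k):
    """Enumerate the 2^k * C(n, k) possible k-clauses over variables 0..n-1.

    A clause is a tuple of (variable, sign) pairs; sign 0 stands for the
    positive literal and sign 1 for the negated one (hypothetical encoding).
    """
    for variables in itertools.combinations(range(n), k):
        for signs in itertools.product((0, 1), repeat=k):
            yield tuple(zip(variables, signs))


def sample_formula(n, k, alpha, rng):
    """Keep each possible clause independently with probability
    p = alpha * n / (2^k * C(n, k)), i.e. sample from the product measure."""
    total = 2**k * math.comb(n, k)
    p = alpha * n / total
    return [clause for clause in all_clauses(n, k) if rng.random() < p]


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, alpha = 12, 3, 2.0
    sizes = [len(sample_formula(n, k, alpha, rng)) for _ in range(200)]
    # The empirical mean should be close to alpha * n (binomial concentration).
    print("expected:", alpha * n, "empirical mean:", sum(sizes) / len(sizes))
```

Exhaustive enumeration is only practical for small $N$; it is used here because it mirrors the definition of the hypercube of formulae directly.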

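More generally, let $\mathcal{N}$ be a positive integer; a property $Y \subset \{0, 1\}^{\mathcal{N}}$ is called
monotone if, for any $y, y' \in \{0, 1\}^{\mathcal{N}}$, $y \le y'$ and $y \in Y$ implies $y' \in Y$. In that
case $\mu_p(y \in Y)$ is an increasing function of $p \in [0, 1]$ where

$$\mu_p(y_1, \ldots, y_{\mathcal{N}}) = p^{|y|} \cdot (1 - p)^{\mathcal{N} - |y|} \quad \text{where } |y| = \#\{1 \le i \le \mathcal{N} : y_i = 1\}.$$

For any non trivial $Y$ we can define for every $\beta \in ]0, 1[$ the unique $p_\beta \in ]0, 1[$
such that:

$$\mu_{p_\beta}(y \in Y) = \beta.$$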
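
As a sanity check on these definitions, the following Python sketch computes $\mu_p(Y)$ by exhaustive enumeration and locates $p_\beta$ by bisection. It assumes that $p_\beta$ is defined by $\mu_{p_\beta}(Y) = \beta$, and it relies on the monotonicity of $\mu_p(Y)$ in $p$ stated above. The names `mu_p`, `p_beta` and the toy property `at_least_two` are hypothetical and chosen only for illustration.

```python
import itertools


def mu_p(prop, n_coords, p):
    """Product measure of {y in {0,1}^n : prop(y)} by exhaustive enumeration."""
    total = 0.0
    for y in itertools.product((0, 1), repeat=n_coords):
        if prop(y):
            weight = sum(y)
            total += p**weight * (1 - p) ** (n_coords - weight)
    return total


def p_beta(prop, n_coords, beta, tol=1e-10):
    """Bisection for the p with mu_p(prop) = beta.

    This is valid because mu_p of a monotone property is increasing in p.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mu_p(prop, n_coords, mid) < beta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def at_least_two(y):
    """Toy monotone property: at least two coordinates are equal to 1."""
    return sum(y) >= 2
```

For the toy property on three coordinates, $\mu_p = 3p^2 - 2p^3$, so `p_beta(at_least_two, 3, 0.5)` should return a value close to 0.5.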

In our case $Y$ will be the property of being x-unsatisfiable. If we put:

$$\mathcal{D} = \{(\sigma, \tau) \in \{0,1\}^N \times \{0,1\}^N \ \text{s.t.}\ d_{\sigma\tau} \in [Nx - \varepsilon(N), Nx + \varepsilon(N)]\} \tag{85}$$

then x-unsatisfiability can be read:

$$F \in Y \iff S(F) \times S(F) \cap \mathcal{D} = \emptyset.$$
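
For very small formulae, the definition of x-unsatisfiability can be checked by brute force. The sketch below is a hedged illustration: the clause encoding (tuples of (variable, sign) pairs, sign 0 for a positive literal) and the function names are assumptions made here, and the tolerance `eps` plays the role of $\varepsilon(N)$.

```python
import itertools


def satisfies(assignment, formula):
    """A literal (v, s) is true when assignment[v] != s, so sign 0 is the
    positive literal and sign 1 the negated one."""
    return all(any(assignment[v] != s for v, s in clause) for clause in formula)


def solution_distances(formula, n):
    """Set of Hamming distances realised by pairs of satisfying assignments."""
    sols = [a for a in itertools.product((0, 1), repeat=n) if satisfies(a, formula)]
    return {sum(x != y for x, y in zip(a, b)) for a in sols for b in sols}


def is_x_unsatisfiable(formula, n, x, eps):
    """True when no pair of solutions has distance in [n*x - eps, n*x + eps]."""
    return not any(
        n * x - eps <= d <= n * x + eps for d in solution_distances(formula, n)
    )


if __name__ == "__main__":
    # (x0 or x1) and (not x0 or not x1): the solutions are 01 and 10.
    formula = [((0, 0), (1, 0)), ((0, 1), (1, 1))]
    print(solution_distances(formula, 2))
    print(is_x_unsatisfiable(formula, 2, 0.5, 0))
```

In this two-variable example the only solutions are at distance 0 or 2, so the formula is satisfiable but x-unsatisfiable at $x = 1/2$ with zero tolerance.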

Observe that the number of clauses in $G_K(N, N\alpha)$ is distributed as a binomial
law $\mathrm{Bin}(\mathcal{N}, p = \alpha N / \mathcal{N})$ peaked around its expected value $p \cdot \mathcal{N} = \alpha N$.
Therefore, from well known results on monotone properties of the hypercube
[44, page 21 and Corollary 1.16 page 19], our Theorem 1.6 is equivalent to the
following result, which establishes the sharpness of the monotone property $Y$
under $\mu_p$.

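Theorem 5.1 For each $K \ge 3$ and $x$, $0 < x < 1$, there exists a sequence
$\alpha_N(K, x)$ such that for all $\eta > 0$:

$$\lim_{N \to \infty} \mu_p(F \text{ is } x\text{-unsatisfiable}) = \begin{cases} 0 & \text{if } p \cdot \mathcal{N} = (1 - \eta)\, \alpha_N(K, x)\, N, \\ 1 & \text{if } p \cdot \mathcal{N} = (1 + \eta)\, \alpha_N(K, x)\, N. \end{cases} \tag{86}$$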



This theorem will be proved using general results on monotone properties of 
the hypercube. We state these results below without proof. 



5.2 General tools 



The main tool used to prove the existence of a sharp threshold will be a
sharpness criterion stemming from Bourgain's result [12] and from a remark
by Friedgut on the possibility to strengthen his criterion [43, Remark following
Theorem 2.2]. Thus, a slight strengthening of Bourgain's proof in the appendix
of [12] combined with an observation made in [39, Theorem 2.3, page 130] gives
the following sharpness criterion:

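Theorem 5.2 Let $Y_{\mathcal{N}} \subset \{0, 1\}^{\mathcal{N}}$ be a sequence of monotone properties, then
$Y$ has a sharp threshold as soon as there exists a sequence $T_{\mathcal{N}}$ with $T_{\mathcal{N}} \supset Y_{\mathcal{N}}$
such that for any $\beta \in ]0, 1[$ and every $D \ge 1$ the three following conditions are
satisfied:

$$p_\beta = o(1), \tag{87}$$
$$\mu_{p_\beta}(y \ \text{s.t.}\ \exists z \in T,\ z \subset y,\ |z| \le D) = o(1), \tag{88}$$
$$\forall z_0 \notin T,\ |z_0| \le D: \quad \mu_{p_\beta}(y \in Y,\ y \setminus z_0 \notin Y \mid y \supset z_0) = o(1). \tag{89}$$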



We end this subsection by recalling two general results on monotone properties
defined on finite sets, established in [40].

Lemma 5.3 [40, Lemma A.1, page 236]
Let $U = \{1, \ldots, \mathcal{N}\}$ be partitioned into two sets $U'$ and $U''$ with $\#U' =
\mathcal{N}'$, $\#U'' = \mathcal{N}''$ and $\mathcal{N} = \mathcal{N}' + \mathcal{N}''$. For any $u \subset U$ let us denote $u' = u \cap U'$
and $u'' = u \cap U''$. Let $Y \subset \{0, 1\}^{\mathcal{N}}$ be a monotone property. For any element
$u$, let $A(u)$ be the set of elements from $U'$ that are essential for property $Y$ at
$u$: $A(u) = \{i \in U' \ \text{s.t.}\ u \cup \{i\} \in Y\}$. Then, for any $a > 0$ the following holds



fipiu e Y, u" ^Y)< -rr ■ Hp{u ^ Y, > a) + 



(l_p)A/-' ^ '^-^^ J- J ' (i_p)A/-'- 

For the second result we consider a sequence of monotone properties $Y_{\mathcal{N}} \subset
\{0, 1\}^{\mathcal{N}}$. For any fixed $u \in \{0, 1\}^{\mathcal{N}}$, $B_j(u)$ will be the set of collections of $j$
elements such that one can reach property $Y$ from $u$ by adding this collection,
thus $\#B_j(u) \le \binom{\mathcal{N}}{j}$.



Lemma 5.4 [40, Lemma A.2, page 237] Let $Y_{\mathcal{N}} \subset \{0, 1\}^{\mathcal{N}}$ be a sequence of
monotone properties. For any integer $j \ge 1$, for any $b > 0$ and as soon as
$\mathcal{N} \cdot p$ tends to infinity, the following estimate holds

$$\mu_p\left(u \notin Y,\ \#B_j(u) \ge b \cdot \binom{\mathcal{N}}{j}\right) = o(1),$$
$$B_j(u) = \{\{i_1, \ldots, i_j\},\ 1 \le i_1 < \ldots < i_j \le \mathcal{N},\ \text{such that } u \cup \{i_1, \ldots, i_j\} \in Y\}.$$

5.3 Proof of Theorem 5.1 (main steps)



As usual, the first two conditions (87) and (88) are easy to verify for the
x-unsatisfiability property. For the first one we have:

$$\mu_p(F \text{ is } x\text{-satisfiable}) \le \mu_p(F \text{ is satisfiable}) \le 2^N (1 - p)^{\binom{N}{K}}.$$

This shows that $p_\beta \le \frac{N \ln 2 - \ln(1 - \beta)}{\binom{N}{K}}$, thus for x-unsatisfiability we get:

$$\forall \beta \in ]0, 1[ \quad p_\beta(N) = O(N^{1-K}). \tag{90}$$
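
The scaling in (90) can be checked numerically with a short Python sketch. It assumes the standard first-moment form of the bound, namely that a fixed assignment is violated by exactly $\binom{N}{K}$ of the possible clauses, so that the probability of satisfiability is at most $2^N (1-p)^{\binom{N}{K}}$. The explicit expression used for the bound on $p_\beta$ follows from that assumption together with $1 - p \le e^{-p}$, and the function names are hypothetical.

```python
import math


def first_moment_bound(n, k, p):
    """Upper bound 2^n (1 - p)^C(n, k) on the probability that a formula drawn
    from the product measure is satisfiable (Markov inequality on the number
    of satisfying assignments)."""
    return 2.0**n * (1.0 - p) ** math.comb(n, k)


def p_beta_upper_bound(n, k, beta):
    """Value of p at which 2^n exp(-p C(n, k)) equals 1 - beta.

    Above this p the unsatisfiability probability exceeds beta, hence
    p_beta cannot be larger (assumed form of the bound, see lead-in).
    """
    return (n * math.log(2) - math.log(1 - beta)) / math.comb(n, k)


if __name__ == "__main__":
    k, beta = 3, 0.5
    for n in (50, 100, 200, 400):
        p = p_beta_upper_bound(n, k, beta)
        # p * n^(k-1) should stay bounded, in line with p_beta = O(N^(1-K)).
        print(n, p, p * n ** (k - 1))
```

The last column printed by the usage example stays bounded as $N$ grows, which is the content of (90) for this particular bound.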



For the second condition, let $H(F)$ be the $K$-uniform hypergraph associated to
a formula $F$: its vertices are the variables of $\Omega(F)$ occurring in $F$, and each index set
of a clause $C$ in $F$ corresponds to a hyperedge. Let us recall, see [45], that a $K$-uniform
connected hypergraph with $v$ vertices and $w$ edges is called a hypertree
when $(K - 1)w - v = -1$; it is said to be unicyclic when $(K - 1)w - v = 0$,
and complex when $(K - 1)w - v \ge 1$. Let $T$ be the set of formulae $F$ such
that $H(F)$ has at least one complex component. We will establish (88) (and
also (89)) by using the following result on non complex formulae, the proof of
which is deferred to the next subsection:

Lemma 5.5 Let $K \ge 3$. If $G$ is a $K$-CNF formula on $v$ variables whose
associated hypergraph is a hypertree or unicyclic, then for all integers $d \in
\{0, \ldots, v\}$ there exists $(\sigma, \tau) \in S(G) \times S(G)$ such that $d_{\sigma\tau} = d$.
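
The classification of components into hypertrees, unicyclic and complex components only depends on the quantity $(K-1)w - v$, and can be sketched in a few lines of Python. The function name `component_types` and the representation of a formula by the list of its clause index sets are assumptions made for this illustration; clauses sharing the same index set are counted as a single hyperedge.

```python
from collections import defaultdict


def component_types(clauses, k):
    """Label each connected component of the hypergraph of a K-CNF formula.

    `clauses` is a list of k-tuples of distinct variable indices (literal
    signs are irrelevant here). The label is decided by (k - 1) * w - v.
    """
    edges = {tuple(sorted(c)) for c in clauses}
    parent = {}

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for edge in edges:
        for v in edge:
            parent.setdefault(v, v)
        for v in edge[1:]:
            ra, rb = find(edge[0]), find(v)
            if ra != rb:
                parent[ra] = rb

    vertices = defaultdict(set)
    n_edges = defaultdict(int)
    for edge in edges:
        root = find(edge[0])
        n_edges[root] += 1
        vertices[root].update(edge)

    labels = []
    for root, verts in vertices.items():
        excess = (k - 1) * n_edges[root] - len(verts)
        if excess == -1:
            labels.append("hypertree")
        elif excess == 0:
            labels.append("unicyclic")
        else:
            labels.append("complex")
    return sorted(labels)
```

For instance, two 3-clauses sharing one variable form a hypertree ($2 \cdot 2 - 5 = -1$), while three 3-clauses arranged in a cycle on six variables form a unicyclic component ($2 \cdot 3 - 6 = 0$).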

In particular, this result shows that any x-unsatisfiable formula has at least
one complex component, i.e. $T \supset Y$. Then observe that there are $O(N^{(K-1)s-1})$
distinct complex components of size $s$, since such a component has at most $(K-1)s - 1$ vertices. Thus we get for all
$p$:

$$\mu_p(F \ \text{s.t.}\ \exists G \in T,\ G \subset F,\ |G| \le D) \le \sum_{s \le D} O(N^{(K-1)s-1}) \cdot p^s,$$

and (88) follows from (90).



In order to prove (89), let us introduce some tools inspired by [40].

For each positive integer $t$ and $\Delta = (\Delta_1, \ldots, \Delta_t) \in \{0, 1\}^t$, a $\Delta$-assignment
is an assignment for which the $t$ first values of the variables are equal to
$\Delta_1, \ldots, \Delta_t$. Then $S_\Delta(F)$ will denote the set of satisfying $\Delta$-assignments to $F$:
$S_\Delta(F) \subset S(F) \subset \{0, 1\}^N$.

For any pair of $t$-tuples $(\Delta, \Delta') \in \{0, 1\}^t \times \{0, 1\}^t$ we define $Y^{\Delta,\Delta'}$:

$$F \in Y^{\Delta,\Delta'} \iff S_\Delta(F) \times S_{\Delta'}(F) \cap \mathcal{D} = \emptyset.$$
Observe that $Y^{\Delta,\Delta'}$ is a monotone property containing $Y$.



Now we come back to (89) with $F_0 \notin T$, so that the hypergraph associated to
the booster formula $F_0$ has no complex components. Then $S(F_0) \ne \emptyset$ and w.l.o.g.
we can suppose that $\Omega(F_0) = \{x_1, \ldots, x_t\}$. Then, for $F \in Y$ such that $F \supset
F_0$ with $F \setminus F_0 \notin Y$, let $F''$ denote the largest subformula of $F$ such that
$\Omega(F'') \cap \Omega(F_0) = \emptyset$. We have the two following claims whose proof is
postponed to the next subsection.

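Claim 5.6 For any $(\Delta, \Delta') \in S(F_0) \times S(F_0)$, $F \setminus F_0 \in Y^{\Delta,\Delta'}$.

Claim 5.7 There exists $(\Delta, \Delta') \in S(F_0) \times S(F_0)$ such that $F'' \notin Y^{\Delta,\Delta'}$.

Thus (89) is proved as soon as for any $\beta \in ]0, 1[$ and $(\Delta, \Delta') \in \{0, 1\}^t \times \{0, 1\}^t$:

$$\mu_{p_\beta}(F \setminus F_0 \in Y^{\Delta,\Delta'},\ F'' \notin Y^{\Delta,\Delta'} \mid F \supset F_0) = o(1). \tag{91}$$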

The first two events in (91) do not depend on the set of clauses in
$F_0$, thus by independence under the product measure and recalling that $Y^{\Delta,\Delta'}$
is a monotone property we are led to prove that:

$$\mu_{p_\beta}(F \in Y^{\Delta,\Delta'},\ F'' \notin Y^{\Delta,\Delta'}) = o(1).$$

From (90) we know that $p_\beta(N) = O(N^{1-K})$. Let $\mathcal{N}' = \Theta(N^{K-1})$ be the
number of clauses having at least one variable in $\{x_1, \ldots, x_t\}$, then Lemma 5.3
applied to the monotone property $Y^{\Delta,\Delta'}$ shows that the above assertion is
true as soon as we are able to prove that for all $\gamma > 0$:

$$\mu_{p_\beta}(F \notin Y^{\Delta,\Delta'},\ \#A_{\Delta,\Delta'}(F) \ge \gamma \cdot N^{K-1}) = o(1), \tag{92}$$

where $A_{\Delta,\Delta'}(F)$ is the set of $K$-clauses $C$ having at least one
variable in $\{x_1, \ldots, x_t\}$ and such that $F \wedge C \in Y^{\Delta,\Delta'}$.

Then let $B_{\Delta,\Delta'}(F)$ be the set of collections of $(K - 1)$ $K$-clauses $\{C_1, \ldots, C_{K-1}\}$
such that $F \wedge C_1 \wedge \ldots \wedge C_{K-1} \in Y^{\Delta,\Delta'}$. From Lemma 5.4 we deduce that (92)
is true as soon as the following result is proved:

Lemma 5.8 For all $t$, $K \ge 3$, $\gamma > 0$ and $(\Delta, \Delta') \in \{0, 1\}^t \times \{0, 1\}^t$, there
exists $\theta > 0$ such that for all $N$, the following holds:

$$\#A_{\Delta,\Delta'}(F) \ge \gamma \cdot N^{K-1} \implies \#B_{\Delta,\Delta'}(F) \ge \theta \cdot N^{K(K-1)}. \tag{93}$$

Again the proof of this last result is deferred to the next subsection, which
furnishes a detailed and complete proof of Theorem 5.1.

5.4 Detailed proofs 

5.4.1 Lemma 5.5

Proof: When $G$ has a leaf-clause, that is a clause $C = x_{i_1}^{\varepsilon_1} \vee \ldots \vee x_{i_K}^{\varepsilon_K}$ having
only one variable, say $x_{i_1}$, in common with $G \setminus C$, the assertion can be proved
by induction on the number of clauses in $G$. Indeed from a pair of satisfying
assignments $(\sigma, \tau) \in S(G \setminus C) \times S(G \setminus C)$ with $d_{\sigma\tau} = d$ and a pair of satisfying
assignments at distance $d' \in \{0, \ldots, K - 1\}$ for $C' = x_{i_2}^{\varepsilon_2} \vee \ldots \vee x_{i_K}^{\varepsilon_K}$, one gets
a pair of satisfying assignments at distance $d + d'$. But $C'$ is a $(K - 1)$-clause,
thus for any $d' \in \{0, \ldots, K - 1\}$ $C'$ has a pair of satisfying assignments at
distance $d'$.

When any $K$-clause $C_i$ of $G = C_1 \wedge \ldots \wedge C_l$ has exactly two variables in common
with $G \setminus C_i$ then we can write $C_1 = x_1^{\varepsilon_1} \vee x_2^{\varepsilon'_1} \vee C'_1$, $C_2 = x_2^{\varepsilon_2} \vee x_3^{\varepsilon'_2} \vee C'_2$, $\ldots$, $C_l =
x_l^{\varepsilon_l} \vee x_1^{\varepsilon'_l} \vee C'_l$ where the $C'_j$ are $(K - 2)$-clauses. A variable in $C'_j$ occurs exactly
once in formula $G$ and the set of variables in these $C'_j$ is equal to $\{x_{l+1}, \ldots, x_v\}$.
In particular this set is disjoint from the set of variables of the 2-CNF formula
$(x_1^{\varepsilon_1} \vee x_2^{\varepsilon'_1}) \wedge (x_2^{\varepsilon_2} \vee x_3^{\varepsilon'_2}) \wedge \ldots \wedge (x_l^{\varepsilon_l} \vee x_1^{\varepsilon'_l})$. First observe that this 2-CNF cyclic
formula always has a satisfying assignment $(\sigma_1, \ldots, \sigma_l)$ and together with any
truth value for the $(x_j, j > l)$ it gives a satisfying assignment for $G$. Thus, for
$G$, one gets a pair of satisfying assignments at distance $d$ for any $d \le v - l$.
Second, as $\Omega(C'_j) \cap \Omega(C'_k) = \emptyset$ when $j \ne k$, a satisfying assignment $\sigma_{l+1}, \ldots, \sigma_v$
can easily be found for $C'_1 \wedge \ldots \wedge C'_l$. Together with any truth values of the
$(x_i, i \le l)$ it gives a satisfying assignment for $G$. Then, from the satisfying
assignment $(\sigma_1, \ldots, \sigma_l, 1 - \sigma_{l+1}, \ldots, 1 - \sigma_v)$ one gets, for any $d \ge v - l$, a pair
of satisfying assignments at distance $d$. □



5.4.2 Claims 5.6 and 5.7

Proof: Observe that any SAT-x-pair $(\sigma, \tau)$ for $F \setminus F_0$ with $(\sigma_1, \ldots, \sigma_t) \in S(F_0)$
and $(\tau_1, \ldots, \tau_t) \in S(F_0)$ is also a SAT-x-pair for $F$. This proves the first claim
by contradiction.

For the second claim, $F \setminus F_0 \notin Y$ so there exists a SAT-x-pair $(\sigma, \tau) \in
S(F \setminus F_0) \times S(F \setminus F_0)$. By construction, the set of satisfying assignments of $F''$
does not depend on the first $t$ coordinates. Let $d_t$ be the Hamming distance
between $(\sigma_1, \ldots, \sigma_t)$ and $(\tau_1, \ldots, \tau_t)$. We know that all components of the hypergraph
associated to formula $F_0$ are non complex, and Lemma 5.5 shows that there
exist $(\sigma'_1, \ldots, \sigma'_t) \in S(F_0)$ and $(\tau'_1, \ldots, \tau'_t) \in S(F_0)$ such that $d_{\sigma'\tau'} = d_t$. Hence
$(\sigma'_1, \ldots, \sigma'_t, \sigma_{t+1}, \ldots, \sigma_N)$ and $(\tau'_1, \ldots, \tau'_t, \tau_{t+1}, \ldots, \tau_N)$ now form a SAT-x-pair for
$F''$, thus proving the second claim. □



5.4.3 Lemma 5.8



Proof: In [42], Erdős and Simonovits proved that any sufficiently dense uniform
hypergraph always contains specific subhypergraphs. In particular they
considered a generalization of the complete bipartite graph specified by two
integers $h \ge 2$ and $m \ge 1$. Let us denote by $K_h(m)$ the $h$-uniform hypergraph
with $h \cdot m$ vertices partitioned into $h$ classes $V_1, \ldots, V_h$ with $\#V_i = m$ and
whose hyperedges are those $h$-tuples which have exactly one vertex in each
$V_i$. Thus $K_h(m)$ has $m^h$ hyperedges; for $h = 2$ it is a complete bipartite graph
$K(m, m)$.

For proving Lemma 5.8 we need a small variation on a result of Erdős and
Simonovits which differs only in that it deals with ordered $h$-tuples as opposed
to sets of size $h$. More precisely, let us consider hypergraphs on $n$ vertices,
say $\{x_1, \ldots, x_n\}$; we will say that two disjoint subsets of vertices $A$ and $B$
verify $A < B$ if for all $x_i$ in $A$ and all $x_j$ in $B$ we have $i < j$. Let $H$ be
an $h$-uniform hypergraph with vertex set $\{x_1, \ldots, x_n\}$, then any $h$-uniform
subhypergraph $K_h(m)$ with $V_1 < \ldots < V_h$ is called an ordered copy of $K_h(m)$
in $H$. Thus, the ordered version of the theorem from Erdős and Simonovits
about supersaturated uniform hypergraphs [42, Corollary 2, page 184] can be
stated as follows.

Theorem 5.9 (Ordered Erdős-Simonovits) Given $c > 0$ and two integers
$h \ge 2$ and $m \ge 1$, there exist $c' > 0$ and $N$ such that for all integers $n \ge N$, if
$H$ is an $h$-uniform hypergraph over $n$ vertices having at least $c \cdot n^h$ hyperedges
then $H$ contains at least $c' n^{hm}$ ordered copies of $K_h(m)$.

We will also use the following observation made when one considers an assignment
of two colours, say 0 and 1, to the hyperedges of $K_h(m)$. First let us say
that a vertex $s$ is $c$-marked if $s$ belongs to at least one $c$-colored hyperedge. A
subset of vertices $S$ is said to be $c$-marked if every $s$ in $S$ is $c$-marked.

Claim 5.10 Let $h \ge 2$, $m \ge 1$, and $V_1, \ldots, V_h$ the partition associated to
$K_h(m)$. Consider an assignment of two colours to the hyperedges of $K_h(m)$,
then at least one of the $V_i$ is marked.

Indeed, suppose that $V_2, \ldots, V_h$ are not $c$-marked. Now consider a vertex
$s \in V_1$; then $s$ is $(1 - c)$-marked, else by construction of $K_h(m)$, $V_i$ would be
$c$-marked for all $i \ge 2$. Hence $V_1$ becomes $(1 - c)$-marked.
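
For small parameters, both the count of hyperedges of $K_h(m)$ and Claim 5.10 can be verified exhaustively. The Python sketch below is an illustration only: vertices are encoded as pairs (class index, position), which is a hypothetical choice, and a class is called marked when it is $c$-marked for at least one colour $c$.

```python
import itertools


def k_h_m_edges(h, m):
    """Hyperedges of K_h(m): vertex (i, j) is the j-th vertex of class V_i,
    and a hyperedge picks exactly one vertex in each class."""
    classes = [[(i, j) for j in range(m)] for i in range(h)]
    return list(itertools.product(*classes))


def some_class_is_marked(h, m, colouring):
    """`colouring` maps each hyperedge to 0 or 1. Return True when some class
    V_i is c-marked for some colour c."""
    edges = k_h_m_edges(h, m)
    for c in (0, 1):
        marked = {v for e in edges if colouring[e] == c for v in e}
        for i in range(h):
            if all((i, j) in marked for j in range(m)):
                return True
    return False


def claim_holds(h, m):
    """Exhaustively test every 2-colouring of the m^h hyperedges."""
    edges = k_h_m_edges(h, m)
    return all(
        some_class_is_marked(h, m, dict(zip(edges, colours)))
        for colours in itertools.product((0, 1), repeat=len(edges))
    )
```

The exhaustive check is exponential in $m^h$, so it is only meant for tiny cases such as $(h, m) = (2, 3)$ or $(3, 2)$; the argument given in the text covers the general case.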



Now let us show (93), in other words that for any $K$-CNF formula $F$ such
that $A_{\Delta,\Delta'}(F)$ is dense, $B_{\Delta,\Delta'}(F)$ is also dense. For more readability we
will restrict our attention to the special case $K = 3$; using the above fact,
the proof is easily extendable to any $K > 3$. Suppose there exist $\Theta(N^2)$
clauses in $A_{\Delta,\Delta'}(F)$; then, by the pigeonhole principle, for at least one of
the eight types of clause we can find $\Theta(N^2)$ clauses of this type in $A_{\Delta,\Delta'}(F)$.
Suppose, for example, that



$$\#\{C = x_{i_1} \vee x_{i_2} \vee \bar{x}_{i_3},\ 1 \le i_1 < i_2 < i_3 \le N,\ i_1 \le t,\ F \wedge C \in Y^{\Delta,\Delta'}\} = \Theta(N^2).$$

From well chosen elements in $A_{\Delta,\Delta'}(F)$ we now exhibit an element in $B_{\Delta,\Delta'}(F)$.
We consider the graph $H(F)$ associated to formula $F$: the set of vertices is
$\{1, \ldots, N\}$ and for each $C = x_{i_1} \vee x_{i_2} \vee \bar{x}_{i_3} \in A_{\Delta,\Delta'}(F)$ create an edge
$\{i_2, i_3\}$. Let $(\sigma, \tau)$ be a SAT-x-pair for $F$, then either $\sigma \notin S(C)$ or $\tau \notin S(C)$.
Now, following a fixed ordering on the set of pairs of truth assignments we put
the colour 0 on the non colored edge $\{i_2, i_3\}$ if $\sigma_{i_2} = 0$ and $\sigma_{i_3} = 1$, else we put
the colour 1, having in this case $\tau_{i_2} = 0$ and $\tau_{i_3} = 1$. Now, let us take an ordered
copy of $K(3, 3)$ in $H(F)$ with partition $A = \{j_1, j_2, j_3\}$ and $B = \{j_4, j_5, j_6\}$.
From Claim 5.10 we know that one part, say $A$, is marked. In such a case we
have $\sigma_{j_1} = 0, \sigma_{j_2} = 0, \sigma_{j_3} = 0$ ($A$ is 0-marked) or $\tau_{j_1} = 0, \tau_{j_2} = 0, \tau_{j_3} = 0$ ($A$ is
1-marked), hence $(\sigma, \tau)$ is no longer a SAT-x-pair for $F \wedge (x_{j_1} \vee x_{j_2} \vee x_{j_3})$. If
$B$ is marked then $(\sigma, \tau)$ is no longer a SAT-x-pair for $F \wedge (\bar{x}_{j_4} \vee \bar{x}_{j_5} \vee \bar{x}_{j_6})$.
Thus in any case $\{(x_{j_1} \vee x_{j_2} \vee x_{j_3}), (\bar{x}_{j_4} \vee \bar{x}_{j_5} \vee \bar{x}_{j_6})\} \in B_{\Delta,\Delta'}(F)$.

By hypothesis $H(F)$ is a dense graph so from Theorem 5.9 we can find $\Theta(N^6)$
ordered copies of $K(3, 3)$ in $H(F)$. The above construction provides $\Theta(N^6)$ elements in
$B_{\Delta,\Delta'}(F)$, thus proving that this set is also dense. □

5.5 A general sharpness result 

Note that the above proof does not use any information about the shape of the
set $\mathcal{D}$ defining the x-unsatisfiability in terms of a subset of $\{0, \ldots, N\}$, namely
the interval $[Nx - \varepsilon(N), Nx + \varepsilon(N)]$ (see (85)). Actually we can consider
properties defined by a non empty proper subset of $\{0, \ldots, N\}$ and we have
proved the following general result:

Theorem 5.11 Let $J_N$ be a non empty subset of $\{0, \ldots, N\}$ and consider
$$\mathcal{D}_J = \{(\sigma, \tau) \in \{0,1\}^N \times \{0,1\}^N \ \text{s.t.}\ d_{\sigma\tau} \in J_N\}.$$

Let $K \ge 3$ and $Y_J$ be the set of $K$-CNF formulae defined as:

$$F \in Y_J \iff S(F) \times S(F) \cap \mathcal{D}_J = \emptyset.$$

Then, $Y_J$ is a monotone property exhibiting a sharp threshold.

On one hand, any upper bound for the satisfiability threshold, for instance
(90), is an upper bound for all $Y_J$ thresholds. On the other hand, Lemma 5.5
tells us that a non complex formula does not belong to $Y_J$. Then, from [45],
we know that w.h.p. a formula whose ratio between the number of clauses and
the number of variables is less than $1/K(K - 1)$ has no complex component.
Thus it provides a lower bound for all $Y_J$ thresholds.



6 Discussion and Conclusion 

We have developed a simple and rigorous probabilistic method which is a first 
step towards a complete characterization of the clustered hard-SAT phase in 
the random satisfiability problem. Our result is consistent with the clustering 
picture and supports the validity of the one-step replica symmetry breaking 
scheme of the cavity method for $K \ge 8$.

The study of x-satisfiability has the advantage that it does not rely on a
precise definition of clusters. Indeed, it is important to stress that the "appro- 
priate" definition for clusters may vary according to the problem at hand. The 
natural choice seems to be the connected components of the space of SAT- 
assignments, where two adjacent assignments have by definition Hamming 
distance 1. However, although this naive definition seems to work well on the 
satisfiability problem, it raises major difficulties on some other problems. For 
instance, in q-colorability, it is useful to permit color exchanges between two
adjacent vertices in addition to single-vertex color changes. In XORSAT, the
naive definition is inadequate, since jumps from solution to solution can
involve a large, yet finite, Hamming distance due to the hard nature of linear

On the other hand, the existence of a gap in the x-satisfiability property is
stronger than the original clustering hypothesis. Clusters are expected to have 
a typical size, and to be separated by a typical distance. However, even for 
typical formulas, there exist atypical clusters, the sizes and separations of 
which may differ from their typical values. Because of this variety of clus- 
ter sizes and separations, a large range of distances is available to pairs of 
SAT-assignments, which our x-satisfiability analysis takes into account. What
we have shown suggests that, for typical formulas, the maximum size of all
clusters is smaller than the minimum distance between two clusters (for a
certain range of $\alpha$ and $K \ge 8$). This is a sufficient condition for clustering,
but by no means a necessary one. As a matter of fact, our large $K$ analysis
conjectures that $\alpha_1(K)$ (the smallest $\alpha$ such that Conjecture 1.4 is verified)
scales as $2^{K-1} \ln 2$, whereas $\alpha_d(K)$ (where the replica symmetry breaking
occurs) and $\alpha_s(K)$ (where the one-step RSB Ansatz is supposed to be valid)
scale as $2^K \ln K / K$ [22]. According to the physics interpretation, in the range
$\alpha_s(K) < \alpha < \alpha_1(K)$, there exist clusters, but they are not detected by the
x-satisfiability approach. This limitation might account for the failure of our
method for small values of $K$, even though more sophisticated techniques
for evaluating the x-satisfiability threshold $\alpha_c(K, x)$ might yield some results
for $K < 8$. Still, the conceptual simplicity of our method makes it a useful tool
for proving similar phenomena in other systems of computational or physical 
interest. 



A better understanding of the structure of the space of SAT-assignments could 
be gained by computing the average configurational entropy of pairs of clusters 
at fixed distance, which contains details about how intra-cluster sizes and 
inter-cluster distances are distributed. This would yield the value of the x- 
satisfiability threshold. Such a computation was carried out at a heuristic level 
within the framework of the cavity method for the random XORSAT problem

This paper, signed in alphabetical order, is based on previous work by Mora,
Mézard and Zecchina reported in Secs. 1-4 and 6. The proof in Sec. 5 is due to
Daudé.